Abstract

Energy-efficient real-time task scheduling has been actively explored in the past decade. Unlike previous work, this paper considers schedulability conditions for stochastic real-time tasks. A schedulability condition is first presented for frame-based stochastic real-time tasks, and several algorithms are examined to check the schedulability of a given strategy. An approach is then proposed, based on the schedulability condition, to adapt a continuous-speed-based method to a discrete-speed system. The approach stays as close as possible to the continuous-speed-based method while still guaranteeing schedulability. Simulations show that the energy saving can be more than 20% for some system configurations.

Keywords: Stochastic low-power real-time scheduling, 
frame-based systems, schedulability conditions. 

1 Introduction 

In the past decade, energy efficiency has received a lot of attention in system design, ranging from server farms to embedded devices. With a limited energy supply but an increasing demand on system performance, energy-efficient real-time task scheduling in embedded systems has become a highly critical issue. There are two major ways of changing the frequency during task executions: inter-task and intra-task dynamic voltage scaling (DVS). Although intra-task DVS seems to save more energy, its implementation is far more complicated than that of inter-task DVS. Most of the time, it requires very good support from compilers and/or operating systems, which is often hard to obtain for many embedded systems. On the other hand, inter-task DVS is easier to deploy, and tasks might not even be aware of the deployment of the technology.

Energy-efficient real-time task scheduling has been actively explored in the past decade. Low-power real-time systems with stochastic or unknown task durations have been studied for several years. The problem was first considered in systems with only one task, or systems in which each task gets a fixed amount of time. Gruian [3, 4] and Lorch and Smith [5, 6] both showed that when intra-task frequency change is available, the more efficient way to save energy is to increase the speed progressively. Solutions using a discrete set of frequencies and taking speed change overhead into account have also been proposed [11, 10]. For inter-task frequency changes, some work has already been undertaken. In [7], the authors consider a model similar to the one we consider here, even if this model is presented differently. They present several dynamic power management techniques: Proportional, Greedy and Statistical. They do not really take the distribution of the number of cycles into account, but only its maximum, and its average for Statistical. Depending on the strategy, a task gives its slack time (the difference between the worst case and the actual number of used cycles) either to the next task in the frame, or to all of them. In [1], the authors attempt to allow the manager to tune this aggressiveness level, while in [10], they propose to adapt this aggressiveness automatically using the distribution of the number of cycles for each task. The same authors have also proposed a strategy taking the number of available speeds into account from the beginning, instead of patching algorithms developed for continuous-speed processors [?]. Some multiprocessor extensions have been considered in [2].

Although excellent research results have been proposed for energy-efficient real-time task scheduling, little work has been done for stochastic real-time tasks, where the execution cycles of tasks might not be known in advance. In this paper, we are interested in frame-based stochastic real-time systems with inter-task DVS, where frame-based real-time tasks have the same deadline (also referred to as the frame). Note that the frame-based real-time task model exists in many embedded system designs, and the results of this paper can provide insight into the design of more complicated systems. Our contribution is twofold. First, we propose a schedulability test that makes it easy to know whether a frequency selection allows every task in the system to meet its deadline. Second, we provide a general method to adapt a method designed for a continuous set of speeds (or frequencies) to a discrete set of speeds. This can be done more efficiently than with the classical approach by using the schedulability condition we give in the first part. Apart from this alternative way of adapting continuous strategies, we show how this schedulability test can be used to improve the robustness to parameter variations. The capability of the proposed approach is demonstrated by a set of simulations, and we show that the energy saving can be more than 20% for some system configurations.

The rest of this paper is organized as follows. We first present the mathematical model of the real-time system that we consider in Section 2. We then present our first contribution in Section 3, which consists of schedulability conditions and tests for the model. We then use those results in Sections 3.5 and 4 to explain how we can improve the discretization of continuous-speed-based strategies, show the efficiency of this approach in the experimental part in Section 5, and finally conclude in Section 6.



2 Model 

We have $N$ tasks $\{T_i, i \in [1, \dots, N]\}$ which run on a DVS CPU. They all share the same deadline and period $D$ (which we call the frame), and are executed in the order $T_1, T_2, \dots, T_N$. The maximum number of execution cycles of $T_i$ is $w_i$. Task $T_i$ requires $x$ cycles with probability $c_i(x)$, where $c_i(\cdot)$ is the distribution of the number of cycles. Of course, in practice, we cannot use such precise information, and authors usually group cycles in "bins". For instance, we can choose to use a fixed bin system, with $b_i$ the size of the bins. In this case, the probability distribution $c'_i(\cdot)$ is such that $c'_i(k)$ represents the probability of using between $(k - 1) \times b_i$ (excluded) and $k \times b_i$ (included) cycles.
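
As an illustration of the fixed bin system, the following minimal sketch builds a binned distribution from observed cycle counts. The function name `bin_distribution` and the sample values are hypothetical and only serve the example; bin $k$ covers the range $((k-1) \times b_i, k \times b_i]$ as described above.

```python
import math
from collections import Counter


def bin_distribution(cycle_samples, bin_size):
    """Return {k: probability}, where bin k covers ((k-1)*bin_size, k*bin_size].

    Assumption of this sketch: a sample of 0 cycles is put in bin 1.
    """
    counts = Counter(max(1, math.ceil(x / bin_size)) for x in cycle_samples)
    total = len(cycle_samples)
    return {k: n / total for k, n in sorted(counts.items())}


# Hypothetical observed cycle counts for one task, with bins of 100 cycles.
samples = [120, 180, 200, 201, 390, 400]
dist = bin_distribution(samples, bin_size=100)
```

Here the samples 120, 180 and 200 fall in bin 2, 201 falls in bin 3, and 390 and 400 fall in bin 4, so bin boundaries are included on the upper side only.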

The system is said to be expedient if a task never waits intentionally. In other words, $T_1$ starts at time 0, $T_2$ starts as soon as $T_1$ finishes, and so on.

The CPU can run at $M$ frequencies (or speeds) $f_1 < f_2 < \dots < f_M$, and the chosen frequency does not change during task execution. Mode $j$ consumes $P_j$ Watts.

We assume we have $N$ scheduling functions $S_i(t)$ for $i \in [1, \dots, N]$ and $t \in [0, D]$. This function means that if $T_i$ starts its execution at time $t$, it will run until its end at frequency $S_i(t)$, where $S_i(t) \in \{f_1, f_2, \dots, f_M\}$. $S_i(t)$ is then a step function (piecewise constant function), with only $M$ possible values. Note that $S_i(t)$ is not necessarily an increasing or a monotonic function. This model generalizes several scheduling strategies proposed in the literature, such as [8, 10] (where they consider a function corresponding to $S_i(D - t)$), or discrete versions of [?]. Figure 1 shows an example of such a scheduling function set.

A scheduling function can be represented by a set of points (black dots in Figure 1), each representing the beginning of a step. $|S_i|$ is the number of steps of $S_i$. $S_i[k]$, $k \in \{1, \dots, |S_i|\}$, is one point, with $S_i[k].t$ being its time component and $S_i[k].f$ its frequency. $S_i$ then has the same value $S_i[k].f$ in the interval $[S_i[k].t, S_i[k+1].t)$ (with $S_i[|S_i| + 1].t = \infty$), and we have $S_i(t) = S_i[k].f$, where

$$k = \max\{j \in \{1, \dots, |S_i|\} : S_i[j].t \le t\}.$$

Notice that finding $k$ can be done in $O(\log |S_i|)$ (by binary search), and, except in the case of very particular models, $|S_i| \le M$.
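
The point-based representation and its binary-search lookup can be sketched as follows. This is only an illustration under assumptions: the class name `StepFunction` is hypothetical, the points are assumed to be sorted by time with the first step starting at 0, and the frequencies used are arbitrary.

```python
from bisect import bisect_right


class StepFunction:
    """Piecewise-constant scheduling function stored as (start_time, frequency) points."""

    def __init__(self, points):
        # Assumption: points are sorted by start time and the first one starts at 0.
        self.times = [t for t, _ in points]
        self.freqs = [f for _, f in points]

    def __call__(self, t):
        # Largest index k such that times[k] <= t, found in O(log |S_i|).
        k = bisect_right(self.times, t) - 1
        if k < 0:
            raise ValueError("t is before the first step")
        return self.freqs[k]


# Hypothetical scheduling function with three steps.
s1 = StepFunction([(0.0, 150), (2.0, 400), (3.5, 1000)])
```

With this convention, a step starting at time 2.0 applies to any start time in $[2.0, 3.5)$, which matches the half-open intervals used above.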

We first assume that changing CPU frequency does 
not cost any time or energy. See Section 4.1 for exten- 
sions. 

The scheduling functions Si (t) can be pretty general, 
but have to respect some constraints in order to ensure 
the system schedulability and avoid deadline misses. 

Figure 1 Example of scheduling with functions $S_i(t)$. We have 5 tasks $T_1, \dots, T_5$, running every $D$. In this frame, $T_1$ is run at frequency $f_1 = S_1(t_1)$, $T_2$ at $f_2 = S_2(t_2)$, $T_3$ at $f_4 = S_3(t_3)$, etc.
We now need to define the concept of schedulability in our model:

Definition 1. An expedient system $\{T_i, S_i(\cdot)\}, \{f_j\}$ ($i \in \{1, \dots, N\}$, $j \in \{1, \dots, M\}$) is said to be schedulable if, whatever the combination of effective number of cycles for each task, every task $T_i$ finishes its execution no later than the end of the frame.

From this definition, we can easily see that if $\{T_i\}$ is such that $\frac{1}{f_M} \sum_{i=1}^{N} w_i > D$ (the left-hand side represents the time needed to run every task in the frame at the highest speed if every task requires its worst-case execution cycles), the system will never be schedulable, whatever the set of scheduling functions. In the same way, we can see that if $\{T_i\}$ is such that $\frac{1}{f_1} \sum_{i=1}^{N} w_i \le D$, the system is always schedulable, even with a "very bad" set of scheduling functions.
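
These two bounds give a quick classification before any scheduling function is examined. The sketch below is illustrative only; the function name `trivial_schedulability`, its return labels and the numeric values are assumptions, not part of the model.

```python
def trivial_schedulability(wcec, freqs, deadline):
    """Classify a task set using only the lowest and highest frequencies.

    wcec: worst-case execution cycles w_i, freqs: available frequencies,
    deadline: frame length D (cycles / frequency must be in the same time unit).
    """
    total = sum(wcec)
    if total / max(freqs) > deadline:
        return "never"    # even f_M cannot fit all worst cases in the frame
    if total / min(freqs) <= deadline:
        return "always"   # even f_1 fits all worst cases in the frame
    return "depends"      # schedulability depends on the scheduling functions


# Hypothetical values: 600 cycles in total, two frequencies.
wcec = [100, 200, 300]
freqs = [100, 200]
```

Only the third outcome requires the finer conditions on the scheduling functions.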

Of course, a non-schedulable system could be able to run its tasks completely in almost every case. Being non-schedulable means that almost surely (with a probability equal to 1), there will eventually be a frame where a task does not have the time to finish before the deadline (or the end of the frame).

3 Schedulability and Discretization

3.1 Danger Zone 

Lemma 1. Any task in $\{T_i, T_{i+1}, \dots, T_N\}$ can always finish no later than $D$ if and only if the system is expedient, and $T_i$ starts no later than $z_i$, defined as

$$z_i = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k.$$

Proof. This lemma can be proved by induction.
Initialization. We first consider the case of $T_N$. The latest time at which task $T_N$ can start is the time allowing it to end before $D$ even if it consumes its $w_N$ cycles. At the highest frequency $f_M$, $T_N$ takes at most $\frac{w_N}{f_M}$ to finish. $T_N$ then necessarily has to start no later than $D - \frac{w_N}{f_M}$. Otherwise, if the task starts after that time, even at the highest frequency, there is no certainty that $T_N$ will finish by $D$.



3.2 Schedulability Conditions 

Let us now consider conditions on $\{S_i\}$ that guarantee the schedulability of the system. We prove the following theorem:

Theorem 1.

$$S_i(t) \ge \frac{w_i}{z_{i+1} - t} \quad \forall i \in [1, \dots, N],\ t \in [0, z_i],$$

where

$$z_i = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k,$$

is a necessary and sufficient condition to guarantee that, if no task $T_i$ ever requires more than $w_i$ cycles and the system is expedient, every task $T_i$ can finish no later than $z_{i+1}$, and then the last one, $T_N$, no later than $D$.

Proof. We show this by induction. Let $\tau_i$ be the worst finishing time of task $T_i$. Please note that this does not necessarily correspond to the case where every task before $T_i$ consumes its WCEC. Figure 2 highlights why.

Figure 2 Example showing that a shorter number of cycles for one task can result in a worse ending time for subsequent tasks. Here, $t'$ is the point at which $S_2(t)$ goes from $f_1$ to $f_2$. On the top plot, $T_1$ uses slightly fewer cycles than in the bottom plot, and $T_2$ uses the same number in both cases, but is run at $f_1$ in the first case, and at $f_2$ in the second one.
Induction. We know that if (and only if) $T_{i+1}$ starts no later than $z_{i+1}$, the schedulability of $\{T_{i+1}, \dots, T_N\}$ is ensured. We then need to show that if $T_i$ starts no later than $z_i$, it will be finished by $z_{i+1}$. If $T_i$ starts no later than $z_i$, we can choose the frequency so that $T_i$ finishes before $z_{i+1}$: at $f_M$, it needs at most $\frac{w_i}{f_M} = z_{i+1} - z_i$ time units.
Definition 2. The danger zone of $T_i$ is the range $]z_i, D]$.

This danger zone means that if $T_i$ has to start in $]z_i, D]$, we cannot guarantee the schedulability anymore. However, because of the variable nature of execution time, we cannot guarantee either that some task will miss its deadline. Of course, the size of the danger zone of $T_i$ is larger than that of $T_j$ if $i < j$, which means that $z_i < z_j$ iff $i < j$.

In order to simplify some notation, we set $z_{N+1} = D$.
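
The thresholds $z_i$ can be computed in one backward pass over the tasks. The following sketch is an illustration under assumptions: the function name `danger_zone_starts` and the numeric values are hypothetical, and cycles divided by frequency are assumed to be expressed in the same time unit as $D$.

```python
def danger_zone_starts(wcec, f_max, deadline):
    """Return [z_1, ..., z_N, z_{N+1}] with z_{N+1} = D and z_i = z_{i+1} - w_i / f_M."""
    z = [deadline]
    for w in reversed(wcec):
        z.append(z[-1] - w / f_max)
    z.reverse()
    return z


# Hypothetical frame: three tasks, f_M = 100 cycles per time unit, D = 10.
z = danger_zone_starts([200, 100, 300], f_max=100, deadline=10)
```

In this example the last task must start by time 7, the second by time 6 and the first by time 4, so the danger zone grows for earlier tasks as stated above.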



First, we have to show that in the range $[0, z_i]$, $\frac{w_i}{z_{i+1} - t} \le f_M$. As this function is an increasing function of $t$, we just need to consider its maximal value over this range:

$$\frac{w_i}{z_{i+1} - z_i} = \frac{w_i}{w_i / f_M} = f_M.$$
Initialization. For the initialization, we consider $T_1$. Clearly, as the execution length is not taken into account for the frequency selection, the worst case occurs when $T_1$ uses $w_1$ cycles. As $T_1$ starts at time 0, we have

$$\tau_1 = \frac{w_1}{S_1(0)}.$$

As $S_1(t) \ge \frac{w_1}{z_2 - t}$ by hypothesis, we have

$$\tau_1 \le z_2.$$



Figure 3 Set of limit functions $\mathcal{L}_i(t)$, for an example of 4 tasks. DZ represents the Danger Zone of $T_4$.
$T_1$ then ends no later than $z_2$ in any case. Similarly, we have that if $S_1(t) < \frac{w_1}{z_2 - t}$, then $\tau_1 > z_2$, and we cannot guarantee that $T_1$ finishes no later than $z_2$.

Induction. Let us now consider $T_i$, with $i > 1$. We know by induction that $T_{i-1}$ finished its execution between time 0 and time $z_i$. Let $\theta$ be this end time. Knowing that task $T_i$ starts at $\theta$, the worst case for $T_i$ is to use $w_i$ cycles. The worst end time of $T_i$ is then

$$\tau_i = \theta + \frac{w_i}{S_i(\theta)}$$

with $\theta \in [0, \tau_{i-1}]$ and $\tau_{i-1} \le z_i$. Then, as $S_i(t) \ge \frac{w_i}{z_{i+1} - t}$ (which is possible, because we have just shown that the right-hand side is not higher than $f_M$ in the range we have to consider), we have

$$\tau_i \le \theta + \frac{w_i}{\frac{w_i}{z_{i+1} - \theta}} = z_{i+1}.$$

We then have that if $S_i(t) \ge \frac{w_i}{z_{i+1} - t}$, task $T_i$ always finishes no later than $z_{i+1}$, and then, as a consequence, that every task finishes no later than $z_{N+1} = D$.

Symmetrically, we can also show that if $S_i(t) < \frac{w_i}{z_{i+1} - t}$, then $\tau_i$ is higher than $z_{i+1}$, and then $\tau_N$ is higher than $D$, and the system is not schedulable. □

Note that the expedience hypothesis is a little bit too strong. It would be enough to require that $T_i$ never waits intentionally later than $z_i$. $T_1$ does not even have to start at time 0, as long as it starts no later than $z_1$. With this hypothesis, the initialization would be: in the worst case, $T_1$ would start at time $\theta$, somewhere between 0 and $z_1$, and use $w_1$ cycles. In this case, it would end at

$$\tau_1 = \theta + \frac{w_1}{S_1(\theta)} \le \theta + \frac{w_1}{\frac{w_1}{z_2 - \theta}} = z_2,$$

and we know that the CPU can be set to the speed $\frac{w_1}{z_2 - \theta}$, which is not higher than $f_M$ because $\theta$ is in $[0, z_1]$.

Definition 3. We denote by $\mathcal{L}_i(t)$ the schedulability limit, or

$$\mathcal{L}_i(t) = \frac{w_i}{z_{i+1} - t},$$

where

$$z_i = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k.$$

3.3 Discrete Limit

The closest scheduling function set to the limit is

$$S_i(t) = \min\{f \in \{f_1, \dots, f_M\} : f \ge \mathcal{L}_i(t)\}.$$

Informally, we could write this function $S_i(t) = \left\lceil \frac{w_i}{z_{i+1} - t} \right\rceil$, where $\lceil x \rceil$ stands for "the smallest available frequency not lower than $x$". This function varies as a discrete hyperbola between $\left\lceil \frac{w_i}{z_{i+1}} \right\rceil$ and $\left\lceil \frac{w_i}{z_{i+1} - z_i} \right\rceil = \lceil f_M \rceil = f_M$.

This function is however in general not very efficient: $T_i$ is run at the slowest frequency that still allows the following jobs to run in the remaining time. But then, $T_1$ is run very slowly, while $\{T_2, \dots, T_N\}$ have a pretty high probability of running at a high frequency. A more balanced frequency usage is often better.

This strategy actually corresponds to the Greedy technique (DPM-G) described by Mosse et al. [7], except that they consider continuous speeds.

Building such a function is very easy, and is in $O(M)$ for each task, with the method given by Algorithm 1. We mainly need to be able to invert $\mathcal{L}_i$: $\mathcal{L}_i^{-1}(f) = z_{i+1} - \frac{w_i}{f}$.

Algorithm 1 Building Limit, worst case scheduling functions. $(a)^+$ means $\max\{0, a\}$.

z ← D
foreach i ∈ {N, ..., 1} do
    S_i ← (0, f_1)
    foreach j ∈ {2, ..., M} do
        append the step ((z − w_i / f_{j−1})^+, f_j) to S_i
    z ← z − w_i / f_M

An example of such schedulability limits is given in Figure 3, with four tasks and a maximum frequency of 1000 MHz.

In the following, this strategy is named Limit.
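
A possible way to build the Limit step functions from the inverse $\mathcal{L}_i^{-1}(f) = z_{i+1} - w_i / f$ is sketched below. It is an illustration under assumptions rather than a transcription of Algorithm 1: the function name `build_limit_functions` is hypothetical, frequencies are assumed sorted in increasing order, and steps that would start at time 0 simply replace the lower frequencies that can never be used.

```python
def build_limit_functions(wcec, freqs, deadline):
    """Return, for each task, a list of (start_time, frequency) steps following the limit.

    freqs must be sorted in increasing order; the step to freqs[j] starts when the
    limit w_i / (z_{i+1} - t) reaches freqs[j-1], i.e. at z_{i+1} - w_i / freqs[j-1].
    """
    f_max = freqs[-1]
    functions = [None] * len(wcec)
    z_next = deadline  # plays the role of z_{i+1}
    for i in range(len(wcec) - 1, -1, -1):
        w = wcec[i]
        steps = [(0.0, freqs[0])]
        for j in range(1, len(freqs)):
            start = max(0.0, z_next - w / freqs[j - 1])
            if start == 0.0:
                steps = [(0.0, freqs[j])]  # lower frequencies are never sufficient
            else:
                steps.append((start, freqs[j]))
        functions[i] = steps
        z_next -= w / f_max
    return functions


# Hypothetical frame: three tasks, two frequencies, D = 10.
limit = build_limit_functions([200, 100, 300], freqs=[50, 100], deadline=10)
```

In this example the first task may run at the low frequency only if it starts before time 2, while the last one may do so until time 4.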
3.4 Checking the schedulability

Given a set of scheduling functions $\{S_i\}$, checking its schedulability is pretty simple. As we know that the limit function is non-decreasing, we just need to check that each step of $S_i$ is above the limit. This can be done with the following algorithm.



Algorithm 2 Schedulability check

z ← D
foreach i ∈ {N, ..., 1} do
    foreach k ∈ {2, ..., |S_i|} do
        if S_i is below the limit w_i / (z − t) on the step ending at t = S_i[k].t then
            return false
    z ← z − w_i / f_M
return true

This check can then be performed in $O\left(\sum_{i=1}^{N} |S_i|\right)$, which, if $S_i$ is non-decreasing (which is almost always the case), is lower than $O(N \times M)$.

This test can be used offline to check the schedulability of some method or heuristic, but can also be performed as soon as some parameter change has been detected. For instance, if the system observes that a task $T_i$ used more cycles than its (expected) WCEC $w_i$, the test could be performed with the new WCEC in order to see whether the current set of $S$ functions can still be used. Notice that we only need to check tasks between 1 and $i$, because the schedulability of tasks in $\{i + 1, \dots, N\}$ does not depend on $w_i$. See Section 6 about future work for more details.
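
The check can be sketched in a few lines. This is one possible reading of the test, under assumptions: the function name `is_schedulable` and the tolerance `tol` are hypothetical, each step function is a list of (start_time, frequency) points sorted by time, and each step is compared with the limit at its right end (clipped to $z_i$), where the non-decreasing limit is largest.

```python
def is_schedulable(step_functions, wcec, freqs, deadline, tol=1e-9):
    """Return True if every S_i stays above w_i / (z_{i+1} - t) on [0, z_i]."""
    f_max = max(freqs)
    z_next = deadline  # z_{i+1}
    for i in range(len(wcec) - 1, -1, -1):
        w = wcec[i]
        z_i = z_next - w / f_max
        if z_i < 0:
            return False  # the worst cases do not fit in the frame even at f_M
        steps = step_functions[i]
        for k, (start, freq) in enumerate(steps):
            if start > z_i:
                break  # steps starting in the danger zone are irrelevant
            end = steps[k + 1][0] if k + 1 < len(steps) else float("inf")
            end = min(end, z_i)
            # The limit is non-decreasing, so its largest value on this step is at `end`.
            if freq < w / (z_next - end) * (1 - tol):
                return False
        z_next = z_i
    return True


# Hypothetical frame: three tasks, two frequencies, D = 10.
wcec, freqs, deadline = [200, 100, 300], [50, 100], 10
on_limit = [[(0.0, 50), (2.0, 100)], [(0.0, 50), (5.0, 100)], [(0.0, 50), (4.0, 100)]]
always_slow = [[(0.0, 50)], [(0.0, 50)], [(0.0, 50)]]
always_fast = [[(0.0, 100)], [(0.0, 100)], [(0.0, 100)]]
```

The small tolerance only guards against floating-point rounding when a step sits exactly on the limit; an exact implementation could use rational arithmetic instead.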

3.5 Using Schedulability Condition to Discretize Continuous Methods



Figure 4 Two different ways of discretizing a continuous strategy: Discr. strat. 1 rounds up to the first available frequency. Discr. strat. 2 (our proposal) uses the closest available frequency, taking the limit into account. Limit is the strategy described by Algorithm 1.
There are mainly two ways of building a set of $S$-functions for a given system. The first method consists in considering the problem with continuous available frequencies and, by some heuristic, adapting the result to a discrete-speed system. The second method consists in taking into account from the beginning that there are only a limited number of available speeds. The second family of methods has the advantage of usually being more efficient in terms of energy, but the disadvantage of being much more complex, requiring a non-negligible amount of computation or memory. This is not problematic if the system is very stable and its parameters do not change often, but as soon as some on-line adaptation is required, heavy and complex computations cannot be performed anymore.

In the first family, the heuristic usually used consists in computing a continuous function $S_i^c(t)$, which is built in order to be schedulable, and in obtaining a discrete function by using, for any $t$, the smallest frequency above $S_i^c(t)$, or $S_i(t) = \lceil S_i^c(t) \rceil$. However, this strategy is often pessimistic. So far, there was no other method to ensure the schedulability. This is no longer the case, because we provide in this paper a schedulability condition which can be used.

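The main idea is, instead of using the smallest frequency above $S_i^c(t)$, to use the frequency closest to $S_i^c(t)$ and, if needed, to round it up to the schedulability limit $\mathcal{L}_i(t)$. In other words, we use:

$$S_i(t) = \max\{\lceil S_i^c(t) \rfloor, \lceil \mathcal{L}_i(t) \rceil\},$$

where $\lceil x \rfloor$ denotes the available frequency closest to $x$.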

The advantage of this technique is that we have a better chance of being closer to the continuous function (which is often optimal in the case of a continuous CPU). However, both techniques (ceiling and closest frequency) are approximations, and neither of them is guaranteed to be better than the other in every case. As we will show in the experimental section, there are systems in which the classical discretization is better, but there are also many cases where our discretization is better.
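
The difference between the two discretizations can be illustrated pointwise. In the sketch below, the helper names `ceil_freq`, `closest_freq` and `discretize_closest` are hypothetical, the frequency set is the XScale one used in the experimental section, and the continuous speed and limit values are arbitrary; ties in `closest_freq` are broken toward the lower frequency, which is a choice of this sketch.

```python
from bisect import bisect_left


def ceil_freq(x, freqs):
    """Smallest available frequency not lower than x (f_M if x exceeds every frequency)."""
    idx = bisect_left(freqs, x)
    return freqs[min(idx, len(freqs) - 1)]


def closest_freq(x, freqs):
    """Available frequency closest to x (lower one on ties)."""
    return min(freqs, key=lambda f: abs(f - x))


def discretize_closest(s_cont, limit, freqs):
    """max{closest(S^c(t)), ceil(L(t))}: closest frequency, never below the limit."""
    return max(closest_freq(s_cont, freqs), ceil_freq(limit, freqs))


freqs = [150, 400, 600, 800, 1000]  # MHz, sorted in increasing order
```

For a continuous speed of 430 MHz and a limit of 200 MHz, rounding up selects 600 MHz whereas the closest approach selects 400 MHz; if the limit rises to 450 MHz, both select 600 MHz.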

Algorithm 3 shows how step functions can be obtained. For each task, computing its function is in $O(M \times A)$, where $A$ is the complexity of computing $(S_i^c)^{-1}(f)$. Depending on the kind of continuous method we use, $A$ can range between 1 (if $(S_i^c)^{-1}(f)$ has a constant closed form) and $\log(D/\varepsilon) \times B$ with a binary search, where $\varepsilon$ is the desired precision and $B$ the complexity of computing $S_i^c(t)$.

Actually, computing the closest frequency amongst $\{f_1, f_2, \dots, f_M\}$ roughly boils down to computing the round-up frequency amongst the set $\left\{\frac{f_1 + f_2}{2}, \frac{f_2 + f_3}{2}, \dots, \frac{f_{M-1} + f_M}{2}\right\}$, where the range corresponding to $\frac{f_1 + f_2}{2}$ is mapped onto $f_2$, etc. In Algorithm 3, if we simply use $f_{j-1}$ instead of $f$, we obtain the classical round-up operation.
Algorithm 3 Algorithm computing the closest step function to $S_i^c(\cdot)$, respecting the schedulability limit $\mathcal{L}_i(\cdot)$. $(a)^+$ stands for $\max\{0, a\}$.

foreach i ∈ {N, ..., 1} do
    S_i ← (0, f_1)
    foreach j ∈ {2, ..., M} do
        f ← (f_{j−1} + f_j) / 2
        t ← min{(S_i^c)^{−1}(f), L_i^{−1}(f_{j−1})}
        append the step ((t)^+, f_j) to S_i



4 Model Extensions 

4.1 Frequency Changes Overhead 

Our model makes it easy to take the time penalty of frequency changes into account. Let $P_T(f_i, f_j)$ be the time penalty of changing from $f_i$ to $f_j$. This means that once the frequency change is requested (usually by setting a special register to some predefined value), the processor is "idle" during $P_T(f_i, f_j)$ units of time before the next instruction is run. We assume that the worst time overhead occurs when the CPU goes from $f_1$ to $f_M$. We denote this by $P_T^M = \max_{i,j} P_T(f_i, f_j) = P_T(f_1, f_M)$.

Notice that this model is rather pessimistic: on modern DVS CPUs, the processor does not stop after a change request, but still runs at the old frequency for a few cycles before the change becomes effective. However, even if the processor never stops, there is still a penalty, but the time penalty is negative when the speed goes down (because the job will be finished sooner than if the frequency change had been performed before it started). Then, as a first approximation, we could consider that negative penalties compensate positive penalties. But this approximation does not hold for energy penalties, because all of them are obviously positive.

We also want to take the switching time before jobs into account, even if there is no frequency change (we assume that the job switching time is already taken into account in $P_T$). Let $S_T(f_i)$ be the switching time when the frequency is $f_i$ and is not changed between two consecutive jobs. Again, let $S_T^M$ denote $\max_i S_T(f_i)$. Usually, we have $S_T(f_i) < S_T(f_j)$ if $f_i > f_j$. We make here the simplifying hypothesis that the switching time is job independent, which is an approximation since this time usually depends on the amount of used memory. However, for our purpose, we only need to consider an upper bound of this time.

As before, we know that $T_N$ must start no later than $D - \frac{w_N}{f_M}$. If $T_N$ starts at this limit (and even before), the selected frequency must be $f_M$. Then we could have two situations:

• Best case: the previous task $T_{N-1}$ was already running at $f_M$. Then $T_{N-1}$ needs to finish before the start limit for $T_N$, minus the switching time, that is $D - \frac{w_N}{f_M} - S_T^M$;

• Worst case: the previous task $T_{N-1}$ was not running at $f_M$, and we then need to change the frequency. In the worst case, the time penalty will be $P_T^M$. $T_{N-1}$ then needs to finish no later than $D - \frac{w_N}{f_M} - P_T^M$.

The first limit is then a necessary condition, and the second a sufficient condition, to ensure the schedulability of $T_N$. Similarly, we can see that $T_i$ must start before $z_i^n$ to ensure the schedulability of itself and any subsequent task (necessary condition), and this schedulability is ensured (sufficient condition) if $T_i$ starts before $z_i^s$, where $z_i^n$ and $z_i^s$ are defined as:

$$z_i^n = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k - (N - i + 1) S_T^M = z_i - (N - i + 1) S_T^M$$

and

$$z_i^s = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k - (N - i + 1) P_T^M = z_i - (N - i + 1) P_T^M.$$

We can then provide two schedulability conditions:

• Necessary condition: $S_i(t) \ge \frac{w_i}{z_{i+1}^n - t}$;

• Sufficient condition: $S_i(t) \ge \frac{w_i}{z_{i+1}^s - t}$.

Algorithm 3 can easily be adapted using those conditions. We then use $\mathcal{L}_i(t) = \frac{w_i}{z_{i+1}^s - t}$.
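
The two sets of thresholds only differ by the per-task penalty that is subtracted. The following sketch computes both; the function name `overhead_thresholds` and the numeric values are hypothetical, and `switch_max` and `penalty_max` stand for $S_T^M$ and $P_T^M$ expressed in the same time unit as $D$.

```python
def overhead_thresholds(wcec, f_max, deadline, switch_max, penalty_max):
    """Return (z_nec, z_suf): lists of z_i^n and z_i^s for i = 1..N.

    z_i^n = z_i - (N - i + 1) * S_T^M   (necessary condition)
    z_i^s = z_i - (N - i + 1) * P_T^M   (sufficient condition)
    """
    n = len(wcec)
    z_nec, z_suf = [], []
    remaining = 0.0  # time needed by tasks i..N at f_M in the worst case
    for i in range(n - 1, -1, -1):
        remaining += wcec[i] / f_max
        count = n - i  # equals N - i + 1 with 1-based task indices
        z_nec.append(deadline - remaining - count * switch_max)
        z_suf.append(deadline - remaining - count * penalty_max)
    z_nec.reverse()
    z_suf.reverse()
    return z_nec, z_suf


# Hypothetical frame: three tasks, f_M = 100, D = 10, S_T^M = 0.125, P_T^M = 0.25.
z_nec, z_suf = overhead_thresholds([200, 100, 300], 100, 10, 0.125, 0.25)
```

Without overhead the thresholds of this example would be 4, 6 and 7; each is shifted earlier by the accumulated switching time or change penalty of the tasks still to run.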

4.2 Soft Deadlines 

If we want to be a little bit more flexible, we could consider soft deadlines, and adapt our schedulability condition accordingly. The main idea is not to consider the WCEC, but to use some percentile: if $\kappa_i(\varepsilon)$ is such that $\mathbb{P}[c_i \le \kappa_i(\varepsilon)] \ge 1 - \varepsilon$, where $c_i$ is the actual number of cycles of $T_i$, we can use $\kappa_i(\varepsilon)$ as a worst case execution time.

However, it seems almost impossible to compute analytically the probability of missing a deadline with this model. It would boil down to computing the distribution of $E_1 + E_2 + E_3 + \dots + E_N$, where $E_i$ represents the execution time of jobs of task $T_i$. $E_i$ depends on the job length distribution, but also on the speed at which $T_i$ is run, which depends on the time at which $T_{i-1}$ ends, which in turn depends on the time at which $T_{i-2}$ ended, and so on. As the $E_i$'s are not independent, it seems that we cannot use the central limit theorem.

If we accept an approximation of the failure probability, we could proceed in the following way. Let $C_i$ be the random variable giving the number of cycles of $T_i$, and $C = \sum_{i=1}^{N} C_i$. Let $W = \sum_{i=1}^{N} w_i$ be the maximal value of $C$ (the frame worst case execution cycle). Let $C_\varepsilon = \min\{c : \mathbb{P}[C \le c] \ge 1 - \varepsilon\}$.

We assume that using the deadline $D \frac{W}{C_\varepsilon}$ will allow deadlines to be respected with a probability close to $1 - \varepsilon$.
Those propositions are only heuristics, and require more work, both analytic and experimental.
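
As a rough illustration of the quantile $C_\varepsilon$, the sketch below convolves per-task cycle distributions and reads off the smallest $c$ with $\mathbb{P}[C \le c] \ge 1 - \varepsilon$. It assumes that the cycle counts of different tasks are independent, which the text above does not state; the function name `frame_cycle_quantile` and the distributions are hypothetical.

```python
def frame_cycle_quantile(distributions, eps):
    """Smallest c with P[C <= c] >= 1 - eps, where C is the sum of the task cycle counts.

    distributions: one {cycles: probability} dict per task.
    Assumption of this sketch: cycle counts of different tasks are independent.
    """
    total = {0: 1.0}
    for dist in distributions:
        nxt = {}
        for c1, p1 in total.items():
            for c2, p2 in dist.items():
                nxt[c1 + c2] = nxt.get(c1 + c2, 0.0) + p1 * p2
        total = nxt
    cumulative = 0.0
    for c in sorted(total):
        cumulative += total[c]
        if cumulative >= 1 - eps:
            return c
    return max(total)  # only reached through floating-point rounding


# Hypothetical frame: two tasks, each needing 100 or 200 cycles with equal probability.
dists = [{100: 0.5, 200: 0.5}, {100: 0.5, 200: 0.5}]
c_eps = frame_cycle_quantile(dists, eps=0.25)
w_total = sum(max(d) for d in dists)     # W, the frame worst case
relaxed_deadline = 10 * w_total / c_eps  # D * W / C_eps with D = 10
```

The relaxed deadline is then the value passed to the schedulability test in place of $D$, under the heuristic assumption stated above.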

5 Experimental Results 

In order to evaluate the advantage of using a "closest" approach instead of an "upper bound" approach, we applied it to two methods. The first one is described by Mosse et al. in [7], and is called DPM-S (Dynamic Power Management-Statistical), and the second one is described by Xu, Melhem and Mosse [10], and is called PITDVS (Practical Inter-Task DVS).

5.1 DPM-S 

The method DPM-S described in [7] bets that the next jobs will not need more cycles than their average, and computes the speed under this assumption when a job starts. Of course, the schedulability limit is also taken into account. In their paper, the authors consider that they can use any (normalized) frequency between 0 and 1. In order to apply this method to a system with a limited number of frequencies, we can either round them up, or use our "closest" approach. They do not take frequency change overheads into account, but according to what we claimed above, those overheads are easy to integrate.

We now compute the two following step functions in this way, where $avg_i$ stands for the average number of cycles of $T_i$: in Algorithm 3, adapted to take the frequency change overhead into account (cf. Section 4.1),

• DPM-S up: we replace $(S_i^c)^{-1}(f)$ by

$$D - \frac{\sum_{k=i}^{N} avg_k}{f_{j-1}} \qquad (1)$$

• DPM-S closest: we replace $(S_i^c)^{-1}(f)$ by

$$D - \frac{\sum_{k=i}^{N} avg_k}{f} \qquad (2)$$



5.2 PITDVS 



The second method we consider, by Xu, Melhem and Mosse [10], is called PITDVS (Practical Inter-Task DVS), and aims at patching OITDVS (Optimal Inter-Task DVS [9]), an optimal method for ideal processors (with a continuous range of available frequencies). They apply several patches in order to make this optimal method usable for realistic processors. They start by taking speed change overhead into account, then they introduce maximal and minimal speeds (OITDVS assumes speeds from 0 to infinity), and finally, they round up the $S$-function to the smallest available frequency. It is in this last patch that we apply our technique. Using the $\beta_i$ value described in [10] (representing the aggressiveness level), we compute the



step functions in the following way: in Algorithm 3, adapted to take the frequency change overhead into account (cf. Section 4.1),

• PITDVS up (as in [10]): we replace $(S_i^c)^{-1}(f)$ by

$$D - P_T \times (N - i) - \frac{w_i}{\beta_i f_{j-1}} \qquad (3)$$

• PITDVS closest (our adaptation): we replace $(S_i^c)^{-1}(f)$ by

$$D - P_T \times (N - i) - \frac{w_i}{\beta_i f} \qquad (4)$$

In the following, we also run simulations using $\mathcal{L}$ (Limit) to choose the frequency. Our aim was not to show how efficient or how bad this technique is, but rather to show that we often observe rather counterintuitive results.

5.3 Workloads and Simulation Architecture

For the simulations we present below, we use two different sets of workloads. The first one is pretty simple, and quite theoretical. We use a set of 12 tasks, each of them having lengths uniformly distributed between miscellaneous bounds, different from each other. For the second set of simulations, we used several workloads coming from H.264 video decoding, which is used in our lab for some other experiments on a TI DaVinci DM6446 DVS processor. In Figure 9, we show the distribution of the 8 video clips we used, each with several thousands of frames.

We present here experimental results run for two dif- 
ferent kinds of DVS processors (see for instance [SJ for 
details about characteristics): an Intel XScale processor (with frequencies 150, 400, 600, 800 and 1000 MHz), and a PowerPC 405LP (with frequencies 33, 100, 266 and 333 MHz). We took frequency change overhead into account, but the contribution of change overhead was usually negligible for all of the simulations we performed (lower than 0.1% in most cases). As a third CPU, we used the characteristics of XScale, but we disabled one of its available frequencies (400 MHz in the plots we show here), in order to highlight the advantage of using our approximation over the round-up approximation when the number of available frequencies is quite low.

5.4 Simulations 

We performed a large number of simulations in order to compare the energy performance of "round up" and "round to closest". We compare several processor characteristics, and several job characteristics. We use both theoretical models and realistic values extracted from production systems.
Figure 5 Energy consumption relative to DPM-S
Figure 6 Energy consumption relative to PITDVS closest, for a set of 12 tasks with uniform distributions.
For the figures we present here, we simulated the same system with different strategies computed with variations of Algorithm 3, amongst DPM-S closest (Eq. (2)), DPM-S up (Eq. (1)), PITDVS closest (Eq. (4)), PITDVS up (Eq. (3)) and Limit (Algorithm 1), computed the energy consumption, and presented the ratio of this energy to that of PITDVS closest or DPM-S closest. We then simulated the same system for various deadlines, going from the deadline allowing every task to run at the lowest frequency ($D = \frac{1}{f_1} \sum_{i=1}^{N} w_i$) to the smallest deadline allowing every task to run at the highest frequency ($D = \frac{1}{f_M} \sum_{i=1}^{N} w_i$). We even used smaller deadlines, because this limit represents a frame where every task needs its WCEC at the same time, which has a very tiny probability of occurring. We can consider that decreasing the deadline boils down to increasing the load: the smaller the deadline, the higher the average frequency. And quite intuitively, for small and large deadlines (or frame lengths), we do not see any difference between strategies, because they all always use either the lowest (large deadline) or the highest (small deadline) frequency.

A first observation was that in many cases, the $S$-function of PITDVS up was already almost equal to Limit. As a consequence, we could not observe any difference between PITDVS up and PITDVS closest. We can for instance see this in Figure 6, right plot: for deadlines between 0.1 and 0.06, we do not see any difference between PITDVS closest and Limit.

In the first set of simulations (Figures 5 and 6), we
used 12 tasks, each with a uniformly distributed number
of cycles and miscellaneous parameters. On the PowerPC
processor, the performance comparison varies widely.
Depending on the load (or the frame length), PITDVS
closest can gain around 30% compared to PITDVS up, or
lose almost 20%. The comparison between DPM-S closest
and DPM-S up is similar, but with smaller values.

We also observe very abrupt and surprising variations,
such as for Limit around 0.03 in the middle and right
plots of Figure 6. A closer look at these variations
shows that they usually occur when the frequency of
$T_1$ changes. Indeed, as $T_1$ always starts at time 0,
its speed does not really depend upon $S_1(t)$, but only
upon $S_1(0)$. So when $D$ varies, $S_1(0)$ suddenly
jumps from one frequency to another, and a very slight
variation of $D$ can have a big impact on each frame.
Such slight variations do not have the same impact for
other tasks, because of the stochastic nature of task
lengths. For instance, if we slightly change $S_i$
($i \neq 1$), it will only impact a few task speeds. But
slight changes in $S_0$ have either no impact at all, or
an impact on every task in every frame.
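The abrupt jump described above can be reproduced with a toy model. The sketch below is not the S-function of any strategy studied in this paper: it assumes a hypothetical continuous speed for the first task equal to the total worst-case cycle count divided by the deadline, rounded up to a hypothetical frequency set.

```python
from bisect import bisect_left


def round_up(f, freqs):
    """Smallest available frequency >= f (freqs sorted ascending)."""
    i = bisect_left(freqs, f)
    return freqs[min(i, len(freqs) - 1)]


def first_task_frequency(deadline, total_wcec, freqs):
    """Discrete frequency used by the first task, which always starts at time 0.

    ASSUMPTION: the continuous speed is total_wcec / deadline; this is a
    placeholder chosen for illustration only.
    """
    continuous = total_wcec / deadline
    return round_up(continuous, freqs)


# Hypothetical values: frequencies in Hz, total WCEC in cycles.
FREQS = [100e6, 200e6, 400e6]
TOTAL_WCEC = 4e6

# The threshold is at D = 4e6 / 200e6 = 0.02 s: a tiny change in D around
# that value switches the first task between two frequencies.
for d in (0.0205, 0.0201, 0.0199, 0.0195):
    print(d, first_task_frequency(d, TOTAL_WCEC, FREQS))
```

Because the first task starts at the same instant in every frame, this step affects every frame at once, whereas a similar step for a later task only matters for the frames in which that task happens to start near the threshold.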

From these first figures, we certainly cannot claim
that a "closest" approach is always better than an
"upper bound" approach. But these simulations highlight
that there are situations where one approach is better
than the other, and situations where the opposite holds.
System designers should therefore pay attention to the
way they round continuous frequencies. With a very small
additional effort, we can often do better than simply
rounding up the original scheduling function.

For the second set of simulations (using real video
workloads), Figures 7 and 8 show the same kind of
differences as in the previous experiments: depending on
the configuration, one rounding method is better than
the other. With the PowerPC configuration, PITDVS
closest is better than PITDVS up, but DPM-S up seems to
be better than DPM-S closest. However, with the XScale
processor where we disabled one frequency, both
"closest" methods are better than the "up" methods. Note
that we observe the same kind of benefit when disabling
a frequency other than 400MHz.

Figure 7 Energy consumption relative to DPM-S closest, for a set of 8 tasks distributed as shown in Figure 9.

Figure 8 Energy consumption relative to PITDVS closest, for a set of 8 tasks distributed as shown in Figure 9.

From the many experiments we performed, it seems that
our approach is especially interesting when the number
of available frequencies is limited, which is not
surprising. Indeed, the fewer frequencies are available,
the further the system is from the continuous model. As
the two strategies we adapt were originally designed for
a continuous model, and as our adaptation attempts to
stay closer to the original strategy than the classical
adaptation does, such behavior was to be expected.

We have also observed that "smooth" systems, such as
the one with uniform distributions (we have also
simulated other distributions, such as normal or bimodal
normal distributions), do not give smoother curves than
the realistic workloads, even though several of the
latter contain very chaotic data. The irregular behavior
of our curves does not seem to be related to irregular
data, but rather to the fact that, as already mentioned,
slight variations in $S_0$ can have a big impact on the
average energy. In this paper, we do not present a huge
number of simulations, because we do not claim that our
approach is always better: what we present should be
enough to persuade system designers to take a deeper
look at the way they manage discretization.



6 Conclusions and Future Work 

The aim of our work was twofold. First, we presented
a simple schedulability condition for frame-based
low-power stochastic real-time systems. Thanks to this
condition, we are able to quickly check whether a given
scheduling function guarantees the schedulability of the
system, even when frequency change overheads are taken
into account. This test can be used either off-line, to
check that a scheduling function is schedulable, or
on-line, after some parameter changes, to check whether
the functions can still be used.

The second contribution of this paper was to use this
schedulability condition in order to improve the way a
strategy developed for systems with continuous speeds
can be adapted to systems with a discrete set of
available speeds. We showed that our approach is not
always better than the classical one, which consists in
rounding up to the first available frequency, but that
in some circumstances it can give a gain of up to almost
40% in the simulations we presented.

Our future work includes several aspects. First, by
running many more simulations, we would like to identify
more precisely when our approach is better than the
classical one. This would allow system designers to
choose which approach to use without running simulations
or experiments on their system.

Another aspect we would like to consider is to take a
deeper look at how the schedulability test we provide
can improve the robustness of a system. In particular,
if we observe that a job has required more than its
(expected) worst-case number of cycles, how can we
temporarily adapt our system in order to improve its
schedulability before we can compute the new set of
functions using those new parameters?

Figure 9 Distribution of the number of cycles needed to decode different kinds of video, ranging from news
streaming to complex 3D animations. The x-axis is the number of cycles, and the y-axis the probability.